Evaluation of Expressed Sequence Tag Clustering

Author

  • Katryna Cisek

Abstract

Bioinformatics, the application of computer technology to the management of biological information, is essential to deciphering the genetic code of life. Novel high-throughput approaches to genomic analysis, such as microarray technology, together with high-performance supercomputing and computational simulation in DNA analysis, have led to an explosion in the amount of available genomic data. Accurately assembling these data by computational methods has been one of the biggest challenges in bioinformatics, in part because different algorithms, and different parameter settings of the same software, give inconsistent results for the same dataset. This paper analyzes the performance of the PaCE (Parallel Clustering of ESTs) algorithm, implemented as genomic assembly software based on Expressed Sequence Tag (EST) clustering. For comparison, PaCE's performance is evaluated against the different algorithms implemented in the gene-prediction programs Glimmer2.12 and GeneMark.
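EST clustering groups short sequence fragments that are likely to derive from the same gene. The sketch below illustrates only the general idea and is not PaCE's method: it assumes a simplified linking criterion (two ESTs are linked if they share an exact substring of length `k`, where `k` and all function names are hypothetical) in place of the alignment-based pair evaluation performed by dedicated tools such as PaCE, and it ignores reverse complements. Linked ESTs are merged by single linkage (transitive closure) using a union-find structure.

```python
class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests: list[str], k: int) -> list[list[int]]:
    """Cluster ESTs by index: two ESTs are linked if they share any k-mer
    (hypothetical criterion); clusters are the transitive closure of links."""
    uf = UnionFind(len(ests))
    # Map each k-mer to the ESTs containing it, avoiding all-pairs comparison.
    index: dict[str, list[int]] = {}
    for i, seq in enumerate(ests):
        for km in kmers(seq, k):
            index.setdefault(km, []).append(i)
    # Every EST sharing a k-mer joins the same cluster.
    for members in index.values():
        for j in members[1:]:
            uf.union(members[0], j)
    groups: dict[int, list[int]] = {}
    for i in range(len(ests)):
        groups.setdefault(uf.find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy fragments, not real EST data.
    ests = ["ACGTACGTGG", "ACGTGGTTCA", "TTCAGGCATT", "GGGCCCAAAT"]
    print(cluster_ests(ests, 4))  # [[0, 1, 2], [3]]
    print(cluster_ests(ests, 6))  # [[0, 1], [2], [3]]
```

With `k = 4`, fragments 1 and 2 overlap by `TTCA` and are chained into one cluster with fragment 0; with `k = 6` that link disappears and the same input yields three clusters. This toy case mirrors, on a small scale, the sensitivity to parameter settings described in the abstract.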




Publication date: 2005